email classification
Email Spam Detection Using Hierarchical Attention Hybrid Deep Learning Method
Zavrak, Sultan, Yilmaz, Seyhmus
Email is one of the most widely used means of communication, with millions of people and businesses relying on it to share knowledge and information on a daily basis. Nevertheless, the rise in email users has brought a dramatic increase in spam emails in recent years. Processing and managing emails properly is becoming increasingly difficult for individuals and companies. This article proposes a novel technique for email spam detection that is based on a combination of convolutional neural networks, gated recurrent units, and attention mechanisms. During system training, the network selectively focuses on the necessary parts of the email text. The use of convolution layers to extract more meaningful, abstract, and generalizable features through hierarchical representation is the major contribution of this study. Additionally, this contribution incorporates cross-dataset evaluation, which yields performance results that are more independent of the model's training dataset. According to cross-dataset evaluation results, the proposed technique advances the results of existing attention-based techniques by utilizing temporal convolutions, which provide more flexible receptive field sizes. The suggested technique's results are compared to those of state-of-the-art models and show that our approach outperforms them.
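To make the idea of a network "selectively focusing" on parts of the email text concrete, here is a minimal sketch of attention pooling in plain Python. It is not the authors' architecture: the dot-product scoring, the function names, and the toy vectors are all assumptions made for illustration. In the described model, the per-token vectors would come from the convolutional and gated recurrent layers rather than being written by hand.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, context):
    """Weighted average of per-token vectors.

    hidden_states: list of equal-length vectors, one per token
                   (hypothetical stand-ins for CNN/GRU outputs).
    context:       a vector of the same length; in a trained model it
                   would be learned, here it is supplied by hand.
    """
    # Score each token by its dot product with the context vector.
    scores = [sum(h_d * c_d for h_d, c_d in zip(h, context))
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Combine the token vectors, weighting each by its attention weight.
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states))
              for d in range(dim)]
    return weights, pooled

# Toy input: three tokens with 2-dimensional representations.
hidden = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
context = [1.0, 1.0]
weights, pooled = attention_pool(hidden, context)
```

The second token has the largest score against the context vector, so it receives the largest weight and dominates the pooled representation that a classifier layer would then consume.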
Email Classification into relevant labels using Neural Networks
In the real world, many online shopping websites and service providers have a single email ID to which customers can send their queries, concerns, etc. At the back end, a service provider receives millions of emails every week; how can it identify which email belongs to a particular department? This paper presents an artificial neural network (ANN) model that is used to solve this problem, and experiments are carried out on datasets of users' personal Gmail emails. This problem can be generalised as typical Text Classification or Categorization [8]. Electronic mail, or e-mail, is a method of electronic communication between two or more users over the Internet.
Interested in email classification, not sure how to approach • r/textdatamining
I'm working with some friends on an idea for email classification and we're wondering what would be the best way to approach the problem. Essentially we're looking to create an application/Outlook extension that would classify emails into various categories like "Important/Not Important" or "Project email, Contract talks, Trash". We're not totally sure on categories at the moment; if they could be user-defined it would be more useful, I guess. How could one approach such a problem? Is text mining the right approach, or should we be looking into AI/Machine Learning techniques, or a combination of the two? I read a bit about Bayesian probabilities and how, using previous result sets, you get a matrix table of probabilities that's used to determine how new data would be categorised. Is this the best approach or are there alternatives we should be looking at?
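For what it's worth, the Bayesian approach mentioned in the post can be sketched in a few lines. The following is a toy multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written from scratch for illustration. The class name, the whitespace tokenizer, and the training emails are all made up; the category labels are borrowed from the examples in the post. A real extension would more likely use an existing library and far more training data.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()

class NaiveBayes:
    def __init__(self):
        self.class_counts = Counter()            # emails seen per label
        self.word_counts = defaultdict(Counter)  # word counts per label
        self.vocab = set()

    def train(self, text, label):
        self.class_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior: how common this label was in the training data.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                # Add-one smoothing so unseen words do not zero the score.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training emails, one label each.
clf = NaiveBayes()
clf.train("contract renewal terms attached", "Contract talks")
clf.train("please sign the contract draft", "Contract talks")
clf.train("project milestone review meeting", "Project email")
clf.train("project status update and meeting notes", "Project email")
clf.train("win a free prize now", "Trash")
clf.train("free offer click now", "Trash")

label = clf.predict("updated contract terms")
```

Because the categories are just keys in the counters, user-defined categories come for free: any new label passed to `train` becomes a class the model can predict.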
Effect of Part-of-Speech and Lemmatization Filtering in Email Classification for Automatic Reply
Bonatti, Rogerio (Universidade de Sao Paulo) | Paula, Arthur G. de (Universidade de Sao Paulo) | Lamarca, Victor S. (Universidade de Sao Paulo) | Cozman, Fabio G. (Universidade de Sao Paulo)
We study automatic replies to business email messages in Brazilian Portuguese. We present a novel corpus containing messages from a real application, and baseline categorization experiments using Naive Bayes and Support Vector Machines. We then discuss the effect of lemmatization and the role of part-of-speech tagging filtering on precision and recall. Support Vector Machines classification coupled with non-lemmatized selection of verbs, nouns, adjectives and adverbs was the best approach, with 87.3% maximum accuracy. Straightforward lemmatization in Portuguese led to the lowest classification results in the group, with 85.3% and 81.7% precision for SVM and Naive Bayes respectively. Thus, while lemmatization reduced precision and recall, part-of-speech filtering improved overall results.
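The two ingredients discussed in this abstract, part-of-speech filtering and precision/recall, can be illustrated with a short sketch. It assumes the tokens have already been tagged by some external tagger; the tag names follow the Universal POS convention, which is an assumption and not necessarily the tag set used in the paper. The example sentence is in English for readability, and the labels and predictions are invented.

```python
# Keep only the word classes the abstract reports as most useful.
KEEP_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}

def filter_by_pos(tagged_tokens, keep=KEEP_TAGS):
    """Drop tokens whose part-of-speech tag is not in `keep`."""
    return [token for token, tag in tagged_tokens if tag in keep]

def precision_recall(y_true, y_pred, positive):
    """Precision and recall for a single class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hand-tagged toy sentence (tags assigned manually for illustration).
tagged = [("the", "DET"), ("invoice", "NOUN"), ("was", "AUX"),
          ("paid", "VERB"), ("very", "ADV"), ("late", "ADV")]
features = filter_by_pos(tagged)

# Invented gold labels and classifier outputs for four emails.
gold = ["billing", "billing", "support", "billing"]
pred = ["billing", "support", "billing", "billing"]
precision, recall = precision_recall(gold, pred, positive="billing")
```

The filtered token list would be what the classifier sees as input, while `precision_recall` is the kind of per-class measurement used to compare the filtered and unfiltered variants.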